BCON147_MIDTERM_PROJECT_EXERCISE

BCon 147: special topics

Author

MA. ANGILICA V. BAYO

Published

October 25, 2024

0.1 Project overview

In this project, we will explore employee attrition and performance using the HR Analytics Employee Attrition & Performance dataset. The primary goal is to develop insights into the factors that contribute to employee attrition. By analyzing a range of factors, including demographic data, job satisfaction, work-life balance, and job role, we aim to help businesses identify key areas where they can improve employee retention.

0.2 Scenario

Imagine you are working as a data analyst for a mid-sized company that is experiencing high employee turnover, especially among high-performing employees. The company has been facing increased costs related to hiring and training new employees, and management is concerned about the negative impact on productivity and morale. The human resources (HR) team has collected historical employee data and now looks to you for actionable insights. They want to understand why employees are leaving and how to retain talent effectively.

Your task is to analyze the dataset and provide insights that will help HR prioritize retention strategies. These strategies could include interventions like revising compensation policies, improving job satisfaction, or focusing on work-life balance initiatives. The success of your analysis could lead to significant cost savings for the company and an increase in employee engagement and performance.

0.3 Understanding data source

The dataset used for this project provides information about employee demographics, performance metrics, and various satisfaction ratings. The dataset is particularly useful for exploring how factors such as job satisfaction, work-life balance, and training opportunities influence employee performance and attrition.

This dataset is well-suited for conducting in-depth analysis of employee performance and retention, enabling us to build predictive models that identify the key drivers of employee attrition. Additionally, we can assess the impact of various organizational factors, such as training and work-life balance, on both performance and retention outcomes.

## datatable function from DT package create an HTML widget display of the dataset
## install DT package if the package is not yet available in your R environment
readxl::read_excel("dataset/dataset-variable-description.xlsx") |> 
  DT::datatable()

0.4 Data wrangling and management

Libraries

Task: Load the necessary libraries

Before we start working on the dataset, we need to load the libraries that will be used for data wrangling, analysis, and visualization. Make sure to load the following libraries here. To install a package, use the install.packages function. Some packages are introduced later in this project, so install them as needed and load them here.

# load all your libraries here

library(magrittr)

library(dplyr)

library(tidyverse)

library(ggplot2)

library(readr)

library(DT)

library(janitor)

library(GGally)

library(sjPlot)

library(report)
library(ggstatsplot)
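
Several later chunks call install.packages inside the script, which reinstalls packages on every render. A minimal sketch of an alternative, assuming the package list below matches the libraries used in this project (the vector name required_pkgs is only illustrative): check which packages are missing and report them, without installing anything automatically.

```r
# Packages used in this project (assumed list; adjust as needed)
required_pkgs <- c("tidyverse", "DT", "janitor", "GGally",
                   "sjPlot", "report", "ggstatsplot", "readxl")

# requireNamespace() returns TRUE when a package is installed and can be loaded
is_installed <- vapply(required_pkgs, requireNamespace,
                       logical(1), quietly = TRUE)

missing_pkgs <- required_pkgs[!is_installed]

# Only report what is missing; installation stays a manual, one-time step
if (length(missing_pkgs) > 0) {
  message("Install once in the console: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
}
```

Running install.packages once in the console, rather than in the document, keeps rendering fast and reproducible.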

0.4.1 Data importation

Task 4.1. Merging dataset
  • Import the two datasets, Employee.csv and PerformanceRating.csv. Save Employee.csv as employee_dta and PerformanceRating.csv as perf_rating_dta.

  • Merge the two datasets using the left_join function from dplyr. Use the EmployeeID variable as the variable to join by. You may read more information about the left_join function here.

  • Save the merged dataset as hr_perf_dta and display the dataset using the datatable function from DT package.

## import the two data here
employee_dta <- read_csv("dataset/Employee.csv")
perf_rating_dta <- read_csv("dataset/PerformanceRating.csv")


## merge employee_dta and perf_rating_dta using left_join function.

# Check column names of the employee dataset
colnames(employee_dta)
 [1] "EmployeeID"              "FirstName"              
 [3] "LastName"                "Gender"                 
 [5] "Age"                     "BusinessTravel"         
 [7] "Department"              "DistanceFromHome (KM)"  
 [9] "State"                   "Ethnicity"              
[11] "Education"               "EducationField"         
[13] "JobRole"                 "MaritalStatus"          
[15] "Salary"                  "StockOptionLevel"       
[17] "OverTime"                "HireDate"               
[19] "Attrition"               "YearsAtCompany"         
[21] "YearsInMostRecentRole"   "YearsSinceLastPromotion"
[23] "YearsWithCurrManager"   
# Check column names of the performance rating dataset
colnames(perf_rating_dta)
 [1] "PerformanceID"                   "EmployeeID"                     
 [3] "ReviewDate"                      "EnvironmentSatisfaction"        
 [5] "JobSatisfaction"                 "RelationshipSatisfaction"       
 [7] "TrainingOpportunitiesWithinYear" "TrainingOpportunitiesTaken"     
 [9] "WorkLifeBalance"                 "SelfRating"                     
[11] "ManagerRating"                  
# "EmployeeID" is the common column in both datasets, so join on it
hr_perf_dta <- left_join(employee_dta, perf_rating_dta, by = "EmployeeID")

# If the key columns had different names, both would need to be specified
# ("PerfEmployeeID" is a hypothetical name, not a column in this dataset)
# hr_perf_dta <- left_join(employee_dta, perf_rating_dta, by = c("EmployeeID" = "PerfEmployeeID"))

## save the merged dataset as hr_perf_dta
write_csv(hr_perf_dta, "dataset/hr_perf_dta.csv")

## Use the datatable from DT package to display the merged dataset
library(DT) 

datatable(hr_perf_dta)
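
Because an employee can have several performance reviews, the left join returns one row per review, not one row per employee. The sketch below illustrates this with two small made-up tables (employee_toy and perf_toy are hypothetical stand-ins for the real files).

```r
library(dplyr)

# Toy stand-ins for the two files (made-up values, not the real data)
employee_toy <- tibble(
  EmployeeID = c("E1", "E2"),
  Attrition  = c("Yes", "No")
)
perf_toy <- tibble(
  EmployeeID    = c("E1", "E1", "E1"),  # E1 has three reviews, E2 has none
  ManagerRating = c(3, 4, 2)
)

merged_toy <- left_join(employee_toy, perf_toy, by = "EmployeeID")

nrow(merged_toy)                   # 4 rows: three for E1, one for E2
n_distinct(merged_toy$EmployeeID)  # 2 unique employees
```

E2 is kept with a missing ManagerRating, and E1 appears three times. Any count or rate computed on the merged data is therefore weighted by the number of reviews per employee.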

0.4.2 Data management

Task 4.2. Standardizing variable names
  • Using the clean_names function from janitor package, standardize the variable names by using the recommended naming of variables.

  • Save the renamed variables as hr_perf_dta to update the dataset.

## clean names using the janitor packages and save as hr_perf_dta

library(janitor)
hr_perf_dta <- hr_perf_dta %>% clean_names()

## display the renamed hr_perf_dta using datatable function

datatable(hr_perf_dta)
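
To see what clean_names does to individual column names, the sketch below applies make_clean_names (the janitor function that works on a character vector) to three of the original column names. The results match the renamed columns shown later in this document.

```r
library(janitor)

# Three of the original column names from Employee.csv
raw_names <- c("EmployeeID", "DistanceFromHome (KM)", "YearsAtCompany")

# Converts to snake_case and drops spaces and parentheses
make_clean_names(raw_names)
# "employee_id" "distance_from_home_km" "years_at_company"
```
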
Task 4.2. Recode data entries
  • Create a new variable cat_education wherein education is 1 = No formal education; 2 = High school; 3 = Bachelor; 4 = Masters; 5 = Doctorate. Use the case_when function to accomplish this task.

  • Similarly, create new variables cat_envi_sat, cat_job_sat, and cat_relation_sat for environment_satisfaction, job_satisfaction, and relationship_satisfaction, respectively. Re-code the values accordingly as 1 = Very dissatisfied; 2 = Dissatisfied; 3 = Neutral; 4 = Satisfied; and 5 = Very satisfied.

  • Create new variables cat_work_life_balance, cat_self_rating, cat_manager_rating for work_life_balance, self_rating, and manager_rating, respectively. Re-code accordingly as 1 = Unacceptable; 2 = Needs improvement; 3 = Meets expectation; 4 = Exceeds expectation; and 5 = Above and beyond.

  • Create a new variable bi_attrition by transforming the attrition variable into a numeric variable. Re-code accordingly as No = 0, and Yes = 1.

  • Save all the changes in the hr_perf_dta. Note that saving the changes with the same name will update the dataset with the new variables created.

# Load necessary libraries
library(dplyr)

## create cat_education

hr_perf_dta <- hr_perf_dta %>%
  mutate(cat_education = case_when(
    education == 1 ~ "No formal education",
    education == 2 ~ "High school",
    education == 3 ~ "Bachelor",
    education == 4 ~ "Masters",
    education == 5 ~ "Doctorate"
  ))


## create cat_envi_sat,  cat_job_sat, and cat_relation_sat
hr_perf_dta <- hr_perf_dta %>%
  mutate(
    cat_envi_sat = case_when(
      environment_satisfaction == 1 ~ "Very dissatisfied",
      environment_satisfaction == 2 ~ "Dissatisfied",
      environment_satisfaction == 3 ~ "Neutral",
      environment_satisfaction == 4 ~ "Satisfied",
      environment_satisfaction == 5 ~ "Very satisfied"
    ),
    cat_job_sat = case_when(
      job_satisfaction == 1 ~ "Very dissatisfied",
      job_satisfaction == 2 ~ "Dissatisfied",
      job_satisfaction == 3 ~ "Neutral",
      job_satisfaction == 4 ~ "Satisfied",
      job_satisfaction == 5 ~ "Very satisfied"
    ),
    cat_relation_sat = case_when(
      relationship_satisfaction == 1 ~ "Very dissatisfied",
      relationship_satisfaction == 2 ~ "Dissatisfied",
      relationship_satisfaction == 3 ~ "Neutral",
      relationship_satisfaction == 4 ~ "Satisfied",
      relationship_satisfaction == 5 ~ "Very satisfied"
    )
  )




## create cat_work_life_balance, cat_self_rating, and cat_manager_rating

hr_perf_dta <- hr_perf_dta %>%
  mutate(
    cat_work_life_balance = case_when(
      work_life_balance == 1 ~ "Unacceptable",
      work_life_balance == 2 ~ "Needs improvement",
      work_life_balance == 3 ~ "Meets expectation",
      work_life_balance == 4 ~ "Exceeds expectation",
      work_life_balance == 5 ~ "Above and beyond"
    ),
    cat_self_rating = case_when(
      self_rating == 1 ~ "Unacceptable",
      self_rating == 2 ~ "Needs improvement",
      self_rating == 3 ~ "Meets expectation",
      self_rating == 4 ~ "Exceeds expectation",
      self_rating == 5 ~ "Above and beyond"
    ),
    cat_manager_rating = case_when(
      manager_rating == 1 ~ "Unacceptable",
      manager_rating == 2 ~ "Needs improvement",
      manager_rating == 3 ~ "Meets expectation",
      manager_rating == 4 ~ "Exceeds expectation",
      manager_rating == 5 ~ "Above and beyond"
    )
  )


## create bi_attrition
hr_perf_dta <- hr_perf_dta %>%
  mutate(bi_attrition = ifelse(attrition == "No", 0, 1))


## preview the first rows of the updated hr_perf_dta
head(hr_perf_dta)
# A tibble: 6 × 41
  employee_id first_name last_name gender   age business_travel department
  <chr>       <chr>      <chr>     <chr>  <dbl> <chr>           <chr>     
1 3012-1A41   Leonelle   Simco     Female    30 Some Travel     Sales     
2 3012-1A41   Leonelle   Simco     Female    30 Some Travel     Sales     
3 3012-1A41   Leonelle   Simco     Female    30 Some Travel     Sales     
4 3012-1A41   Leonelle   Simco     Female    30 Some Travel     Sales     
5 3012-1A41   Leonelle   Simco     Female    30 Some Travel     Sales     
6 3012-1A41   Leonelle   Simco     Female    30 Some Travel     Sales     
# ℹ 34 more variables: distance_from_home_km <dbl>, state <chr>,
#   ethnicity <chr>, education <dbl>, education_field <chr>, job_role <chr>,
#   marital_status <chr>, salary <dbl>, stock_option_level <dbl>,
#   over_time <chr>, hire_date <chr>, attrition <chr>, years_at_company <dbl>,
#   years_in_most_recent_role <dbl>, years_since_last_promotion <dbl>,
#   years_with_curr_manager <dbl>, performance_id <chr>, review_date <chr>,
#   environment_satisfaction <dbl>, job_satisfaction <dbl>, …
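
The case_when blocks above repeat the same five labels for each variable. As a possible alternative, the sketch below uses base R factor() with a shared label vector. Note two differences from the code above: the result is an ordered set of factor levels rather than a character column, and recode_scale is a hypothetical helper, not part of the assignment.

```r
# Shared labels for the 1-5 rating scale used by the rating variables
rating_labels <- c("Unacceptable", "Needs improvement", "Meets expectation",
                   "Exceeds expectation", "Above and beyond")

# Hypothetical helper: map codes 1..k to labels, keeping the scale order
recode_scale <- function(x, labels) {
  factor(x, levels = seq_along(labels), labels = labels)
}

self_rating <- c(3, 5, 1, NA, 4)  # made-up example values
cat_self_rating <- recode_scale(self_rating, rating_labels)
cat_self_rating

# A logical comparison converts directly to a 0/1 indicator
attrition <- c("Yes", "No", "No")  # made-up example values
bi_attrition <- as.integer(attrition == "Yes")
bi_attrition
```

Keeping the levels in scale order means plots and tables list the categories from lowest to highest instead of alphabetically.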

0.5 Exploratory data analysis

0.5.1 Descriptive statistics of employee attrition

Task 5.1. Breakdown of attrition by key variables
  • Select the variables attrition, job_role, department, age, salary, job_satisfaction, and work_life_balance. Save as attrition_key_var_dta.

  • Compute and plot the attrition rate across job_role, department, age, salary, job_satisfaction, and work_life_balance. To compute the attrition rate by job role, group the dataset by job role. Afterward, use the count function to get the frequency of attrition for each job role, then divide it by the total number of observations in that job role. Save the computation as pct_attrition. Do not forget to ungroup before storing the output. Store the output as attrition_rate_job_role.

  • Plot for the attrition rate across job_role has been done for you! Study each line of code. You have the freedom to customize your plot accordingly. Show your creativity!

## Load necessary libraries
library(dplyr)
library(ggplot2)

## Select the key variables and save as attrition_key_var_dta
attrition_key_var_dta <- hr_perf_dta %>%
  select(attrition, job_role, department, age, salary, job_satisfaction, work_life_balance)

## Compute attrition rate across job_role
attrition_rate_job_role <- attrition_key_var_dta %>%
  group_by(job_role) %>%
  count(attrition) %>%
  mutate(pct_attrition = n / sum(n)) %>%
  filter(attrition == "Yes") %>%
  ungroup()

## Display the attrition rate by job role
print(attrition_rate_job_role)
# A tibble: 11 × 4
   job_role                  attrition     n pct_attrition
   <chr>                     <chr>     <int>         <dbl>
 1 Analytics Manager         Yes          28        0.131 
 2 Data Scientist            Yes         597        0.430 
 3 Engineering Manager       Yes          18        0.0586
 4 HR Executive              Yes          29        0.244 
 5 Machine Learning Engineer Yes          95        0.163 
 6 Manager                   Yes          19        0.131 
 7 Recruiter                 Yes          86        0.566 
 8 Sales Executive           Yes         543        0.347 
 9 Sales Representative      Yes         317        0.634 
10 Senior Software Engineer  Yes          84        0.164 
11 Software Engineer         Yes         445        0.324 
## Plot the attrition rate
ggplot(attrition_rate_job_role, aes(x = reorder(job_role, -pct_attrition), y = pct_attrition)) +
  geom_bar(stat = "identity", fill = "steelblue") +
  labs(title = "Attrition Rate by Job Role", x = "Job Role", y = "Attrition Rate") +
  theme_minimal() +
  coord_flip()
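
The rates above are computed over rows of the merged data, and each row is one performance review. The sketch below, on made-up data, shows how a row-level rate can differ from an employee-level rate when employees have different numbers of reviews. Deduplicating with distinct() is one possible approach, not necessarily the one the assignment intends.

```r
library(dplyr)

# Made-up review-level data: employee A has three reviews, C has two
reviews <- tibble(
  employee_id = c("A", "A", "A", "B", "C", "C"),
  job_role    = c(rep("Sales Representative", 4), rep("Manager", 2)),
  attrition   = c("Yes", "Yes", "Yes", "No", "No", "No")
)

# Rate over review rows (what the code above computes)
rate_by_row <- reviews %>%
  group_by(job_role) %>%
  summarise(pct_attrition = mean(attrition == "Yes"), .groups = "drop")

# Rate over unique employees: keep one row per employee first
rate_by_employee <- reviews %>%
  distinct(employee_id, .keep_all = TRUE) %>%
  group_by(job_role) %>%
  summarise(pct_attrition = mean(attrition == "Yes"), .groups = "drop")

rate_by_row       # Sales Representative: 0.75
rate_by_employee  # Sales Representative: 0.50
```

In the toy data, one departing employee with three reviews pushes the row-level rate to 75%, while only one of two Sales Representatives actually left.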

0.5.2 Identifying attrition key drivers using correlation analysis

Task 5.2. Conduct a correlation analysis to identify key drivers
  • Conduct a correlation analysis of key variables: bi_attrition, salary, years_at_company, job_satisfaction, manager_rating, and work_life_balance. Use the cor() function to run the correlation analysis. Remove missing values using the na.omit() before running the correlation analysis. Save the output in hr_corr.

  • Use a correlation matrix or heatmap to visualize the relationship between these variables and attrition. You can use the GGally package and use the ggcorr function to visualize the correlation heatmap. You may explore this site for more information: ggcorr.

  • Discuss which factors seem most correlated with attrition and what that suggests about why employees are leaving.

## conduct correlation of key variables. 
hr_key_vars <- hr_perf_dta %>%
  select(bi_attrition, salary, years_at_company, job_satisfaction, manager_rating, work_life_balance)

hr_key_vars_clean <- na.omit(hr_key_vars)

hr_corr <- cor(hr_key_vars_clean)

## print hr_corr 
datatable(hr_corr)
## install GGally package and use ggcorr function to visualize the correlation
library(GGally)

hr_key_vars <- hr_perf_dta %>%
  select(bi_attrition, salary, years_at_company, job_satisfaction, manager_rating, work_life_balance)

hr_key_vars_clean <- na.omit(hr_key_vars)


ggcorr(hr_key_vars_clean, 
       palette = "Dark2",
       label = TRUE, 
       label_round = 2, 
       label_size = 3, 
       hjust = 0.75, 
       size = 3)
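
To read the key drivers off a correlation matrix, it helps to pull out the bi_attrition column and sort it by absolute value. The sketch below does this on a small made-up data frame (salary in thousands). The same indexing should work on hr_corr, assuming it is the matrix returned by cor().

```r
# Made-up data for illustration only
toy <- data.frame(
  bi_attrition     = c(1, 1, 1, 0, 0, 0),
  years_at_company = c(1, 2, 1, 8, 9, 10),
  salary           = c(50, 80, 60, 70, 90, 65)
)

corr <- cor(toy)

# Correlation of every other variable with bi_attrition
attr_corr <- corr[rownames(corr) != "bi_attrition", "bi_attrition"]

# Strongest relationships first, regardless of sign
ranked <- attr_corr[order(abs(attr_corr), decreasing = TRUE)]
round(ranked, 2)
```

A negative value means higher values of that variable go with staying (bi_attrition = 0), and a positive value means they go with leaving.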

Discussion:

After running the analysis, look for the variables with the strongest correlation with bi_attrition:

  • Job satisfaction: a strong negative correlation with attrition would suggest that employees who are less satisfied with their jobs are more likely to leave.

  • Work-life balance: a negative correlation with attrition would indicate that employees with poor work-life balance are more prone to leave.

  • Manager rating: a strong correlation here would point to the impact of management on employee retention.

  • Salary and years at company: these may show weaker or stronger relationships with attrition, depending on the specific context of the organization.

0.5.3 Predictive modeling for attrition

Task 5.3. Predictive modeling for attrition
  • Create a logistic regression model to predict employee attrition using the following variables: salary, years_at_company, job_satisfaction, manager_rating, and work_life_balance. Save the model as hr_attrition_glm_model. Print the summary of the model using the summary function.

  • Install the sjPlot package and use the tab_model function to display the summary of the model. You may read the documentation here on how to customize your model summary.

  • Also, use the plot_model function to visualize the model coefficients. You may read the documentation here on how to customize your model visualization.

  • Discuss the results of the logistic regression model and what they suggest about the factors that contribute to employee attrition.

## run a logistic regression model to predict employee attrition
## save the model as hr_attrition_glm_model

hr_key_vars <- hr_perf_dta %>%
  select(bi_attrition, salary, years_at_company, job_satisfaction, manager_rating, work_life_balance)

hr_attrition_glm_model <- glm(bi_attrition ~ salary + years_at_company + job_satisfaction + 
                              manager_rating + work_life_balance, 
                              data = hr_key_vars, family = binomial)


## print the summary of the model using the summary function
summary(hr_attrition_glm_model)

Call:
glm(formula = bi_attrition ~ salary + years_at_company + job_satisfaction + 
    manager_rating + work_life_balance, family = binomial, data = hr_key_vars)

Coefficients:
                    Estimate Std. Error z value Pr(>|z|)    
(Intercept)        2.571e+00  2.173e-01  11.831   <2e-16 ***
salary            -3.633e-06  4.086e-07  -8.893   <2e-16 ***
years_at_company  -6.333e-01  1.476e-02 -42.919   <2e-16 ***
job_satisfaction   3.470e-02  3.186e-02   1.089    0.276    
manager_rating     5.071e-03  3.810e-02   0.133    0.894    
work_life_balance  2.587e-02  3.198e-02   0.809    0.419    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 8574.5  on 6708  degrees of freedom
Residual deviance: 4781.6  on 6703  degrees of freedom
  (190 observations deleted due to missingness)
AIC: 4793.6

Number of Fisher Scoring iterations: 5

## use the tab_model function from sjPlot to display the summary of the model
## run install.packages("sjPlot") once in the console if the package is not yet installed
library(sjPlot)

tab_model(hr_attrition_glm_model)
                     bi attrition
Predictors           Odds Ratios   CI             p
(Intercept)          13.08         8.56 – 20.07   <0.001
salary               1.00          1.00 – 1.00    <0.001
years at company     0.53          0.52 – 0.55    <0.001
job satisfaction     1.04          0.97 – 1.10    0.276
manager rating       1.01          0.93 – 1.08    0.894
work life balance    1.03          0.96 – 1.09    0.419
Observations         6709
R2 Tjur              0.502
## use plot_model function to visualize the model coefficients

plot_model(hr_attrition_glm_model, type = "est", show.values = TRUE, value.offset = .3)
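
The odds ratios reported by tab_model are the exponentiated coefficients from summary(). The sketch below reproduces that arithmetic from three of the estimates printed above. It also shows why the salary odds ratio rounds to 1.00: it is expressed per single unit of salary. The 10,000 step is an arbitrary choice for illustration.

```r
# Estimates copied from the model summary above
coefs <- c(salary           = -3.633e-06,
           years_at_company = -6.333e-01,
           job_satisfaction =  3.470e-02)

# Odds ratio = exp(coefficient)
odds_ratios <- exp(coefs)
round(odds_ratios, 2)  # salary 1.00, years_at_company 0.53, job_satisfaction 1.04

# Salary effect per 10,000 increase instead of per single unit
or_salary_10k <- exp(coefs[["salary"]] * 10000)
round(or_salary_10k, 3)  # about 0.964

# Percentage change in the odds of leaving per extra year of tenure
pct_change_years <- (odds_ratios[["years_at_company"]] - 1) * 100
round(pct_change_years, 1)  # about -46.9
```

Under this model, each additional year at the company is associated with roughly 47% lower odds of attrition, and each additional 10,000 in salary with about 3.6% lower odds.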

Discussion:

Based on the model summary and visualizations:

  • Years at company: the coefficient is negative and significant (estimate = -0.633, p < 0.001; odds ratio 0.53). Holding the other predictors constant, each additional year of tenure is associated with roughly 47% lower odds of attrition, so departures are concentrated among newer employees.

  • Salary: the coefficient is negative and significant (estimate = -3.633e-06, p < 0.001). The odds ratio rounds to 1.00 only because it is expressed per single unit of salary; higher-paid employees have lower odds of leaving.

  • Job satisfaction, manager rating, and work-life balance: none of the three is statistically significant (p = 0.276, 0.894, and 0.419), and all three confidence intervals include an odds ratio of 1. In this model, these ratings do not help distinguish employees who left from those who stayed.

Conclusion: the model points to tenure and salary as the factors most clearly associated with turnover, so HR should prioritize early-tenure retention and compensation. Because the merged data contain one row per performance review, not one row per employee, the standard errors may be understated and the results should be read as indicative.

0.5.4 Analysis of compensation and turnover

Task 5.4. Analyzing compensation and turnover
  • Compare the average monthly income of employees who left the company (bi_attrition = 1) and those who stayed (bi_attrition = 0). Use the t.test function to conduct a t-test and determine if there is a significant difference in average monthly income between the two groups. Save the results in a variable called attrition_ttest_results.

  • Install the report package and use the report function to generate a report of the t-test results.

  • Install the ggstatsplot package and use the ggbetweenstats function to visualize the distribution of monthly income for employees who left and those who stayed. Make sure to map the bi_attrition variable to the x argument and the salary variable to the y argument.

  • Visualize the salary variable for employees who left and those who stayed using geom_histogram with geom_freqpoly. Make sure to facet the plot by the bi_attrition variable and apply alpha on the histogram plot.

  • Provide recommendations on whether revising compensation policies could be an effective retention strategy.

## compare the average monthly income of employees who left and those who stayed
attrition_ttest_results <- t.test(salary ~ bi_attrition, data = hr_perf_dta)


## print the results of the t-test
print(attrition_ttest_results)

    Welch Two Sample t-test

data:  salary by bi_attrition
t = 18.869, df = 5524.2, p-value < 2.2e-16
alternative hypothesis: true difference in means between group 0 and group 1 is not equal to 0
95 percent confidence interval:
 38577.82 47523.18
sample estimates:
mean in group 0 mean in group 1 
      125007.26        81956.76 
## Extract t-test results manually
t_value <- attrition_ttest_results$statistic
df <- attrition_ttest_results$parameter
p_value <- attrition_ttest_results$p.value
ci_lower <- attrition_ttest_results$conf.int[1]
ci_upper <- attrition_ttest_results$conf.int[2]
mean_diff <- attrition_ttest_results$estimate[1] - attrition_ttest_results$estimate[2]


# Create a summary report
summary_report <- data.frame(
  "T-Value" = t_value,
  "Degrees of Freedom" = df,
  "P-Value" = p_value,
  "Mean Difference" = mean_diff,
  "Confidence Interval Lower" = ci_lower,
  "Confidence Interval Upper" = ci_upper
)

# Print summary report
print(summary_report)
  T.Value Degrees.of.Freedom      P.Value Mean.Difference
t 18.8692           5524.236 5.167922e-77         43050.5
  Confidence.Interval.Lower Confidence.Interval.Upper
t                  38577.82                  47523.18
# Create and print the report
report_ttest <- report(attrition_ttest_results)
print(report_ttest) 
Effect sizes were labelled following Cohen's (1988) recommendations.

The Welch Two Sample t-test testing the difference of salary by bi_attrition
(mean in group 0 = 1.25e+05, mean in group 1 = 81956.76) suggests that the
effect is positive, statistically significant, and medium (difference =
43050.50, 95% CI [38577.82, 47523.18], t(5524.24) = 18.87, p < .001; Cohen's d
= 0.51, 95% CI [0.45, 0.56])
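
The sign of the reported difference depends on group order: with the formula interface, t.test subtracts the mean of the second group (1 = left) from the mean of the first (0 = stayed). The sketch below shows this on made-up salaries (in thousands), which is why the effect above is described as positive.

```r
# Made-up data: four employees who stayed (0) and four who left (1)
toy <- data.frame(
  salary       = c(120, 130, 125, 110, 80, 85, 90, 75),
  bi_attrition = c(0, 0, 0, 0, 1, 1, 1, 1)
)

res <- t.test(salary ~ bi_attrition, data = toy)

res$estimate  # mean in group 0 = 121.25, mean in group 1 = 82.5

# Difference is group 0 minus group 1, so it is positive when stayers earn more
mean_diff <- unname(res$estimate[1] - res$estimate[2])
mean_diff     # 38.75
```
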
# install ggstatsplot package and use ggbetweenstats function to visualize the distribution of monthly income for employees who left and those who stayed
library(ggstatsplot)

#Use ggbetweenstats to create the plot
ggbetweenstats(
  data = hr_perf_dta,         
  x = bi_attrition,           
  y = salary,                 
  xlab = "Attrition (0 = Stayed, 1 = Left)",  
  ylab = "Monthly Income",     
  title = "Distribution of Monthly Income for Employees Who Left vs Stayed",
  ggtheme = ggplot2::theme_minimal()
)

# create histogram and frequency polygon of salary for employees who left and those who stayed
# (ggplot2 is already loaded above, so it is not reinstalled here)
ggplot(hr_perf_dta, aes(x = salary)) +
  geom_histogram(aes(y = after_stat(density)),  # after_stat() replaces the deprecated ..density..
                 binwidth = 5000,
                 fill = "blue",
                 color = "black",
                 alpha = 0.4) +
  geom_freqpoly(aes(y = after_stat(density)),
                binwidth = 5000,
                color = "red",
                linewidth = 1) +                # linewidth replaces size for lines
  facet_wrap(~ bi_attrition,
             labeller = as_labeller(c(`0` = "Stayed", `1` = "Left"))) +
  labs(title = "Salary Distribution of Employees Who Stayed vs. Left",
       x = "Monthly Salary",
       y = "Density") +
  theme_minimal()

Discussion:

The Welch t-test shows a statistically significant difference in salary between the two groups (t = 18.87, df = 5524.2, p < .001). Employees who stayed earn 125,007.26 on average, compared with 81,956.76 for those who left, a difference of 43,050.50 (95% CI [38,577.82, 47,523.18]). The effect size is medium (Cohen's d = 0.51). This supports the view that competitive compensation matters for retention. The distribution plot created with the ggbetweenstats function and the faceted histogram with frequency polygons show the salary distributions for both groups and are consistent with lower salary levels being associated with higher attrition.

Given these findings, revising compensation policies could be an effective strategy for improving employee retention, although the t-test shows an association and does not prove that low pay causes departures. Organizations should consider conducting a comprehensive salary review to ensure that pay structures are competitive within the industry and aligned with employee contributions. Additionally, salary increases or bonuses for undercompensated employees may help to reduce attrition, particularly among those whose dissatisfaction with compensation may lead them to seek opportunities elsewhere. Offering clear career progression paths and salary increments tied to performance could also enhance employee satisfaction and loyalty. Overall, addressing compensation concerns is likely to create a more motivated workforce and help reduce turnover.

0.5.5 Employee satisfaction and performance analysis

Task 5.5. Analyzing employee satisfaction and performance
  • Analyze the average performance ratings (both ManagerRating and SelfRating) of employees who left vs. those who stayed. Use the group_by and count functions to calculate the average performance ratings for each group.

  • Visualize the distribution of SelfRating for employees who left and those who stayed using a bar plot. Use the ggplot function to create the plot and map the SelfRating variable to the x argument and the bi_attrition variable to the fill argument.

  • Similarly, visualize the distribution of ManagerRating for employees who left and those who stayed using a bar plot. Make sure to map the ManagerRating variable to the x argument and the bi_attrition variable to the fill argument.

  • Create a boxplot of salary by job_satisfaction and bi_attrition to analyze the relationship between salary, job satisfaction, and attrition. Use the geom_boxplot function to create the plot and map the salary variable to the x argument, the job_satisfaction variable to the y argument, and the bi_attrition variable to the fill argument. You need to transform the job_satisfaction and bi_attrition variables into factors before creating the plot or within the ggplot function.

  • Discuss the results of the analysis and provide recommendations for HR interventions based on the findings.

# dplyr and ggplot2 are already loaded above, so they are not reinstalled here

# Analyze the average performance ratings (both ManagerRating and SelfRating) of employees who left vs. those who stayed.
avg_ratings <- hr_perf_dta %>%
  group_by(bi_attrition) %>%
  summarise(
    avg_manager_rating = mean(manager_rating, na.rm = TRUE),
    avg_self_rating = mean(self_rating, na.rm = TRUE),
    n_rows = n()  # number of review rows per group, not unique employees
  )

# Display the averages so they can be compared in the discussion
avg_ratings

# Visualize the distribution of SelfRating for employees who left and those who stayed using a bar plot.
ggplot(hr_perf_dta, aes(x = self_rating, fill = as.factor(bi_attrition))) +
  geom_bar(position = "dodge") + 
  labs(
    title = "Distribution of Self-Rating for Employees Who Stayed vs Left",
    x = "Self-Rating",
    y = "Count",
    fill = "Attrition (0 = Stayed, 1 = Left)"
  ) +
  theme_minimal()

# Visualize the distribution of ManagerRating for employees who left and those who stayed using a bar plot.
ggplot(hr_perf_dta, aes(x = manager_rating, fill = as.factor(bi_attrition))) +
  geom_bar(position = "dodge") +  
  labs(
    title = "Distribution of Manager Rating for Employees Who Stayed vs Left",
    x = "Manager Rating",
    y = "Count",
    fill = "Attrition (0 = Stayed, 1 = Left)"
  ) +
  theme_minimal()

# create a boxplot of salary by job_satisfaction and bi_attrition to analyze the relationship between salary, job satisfaction, and attrition.
ggplot(hr_perf_dta, aes(x = factor(job_satisfaction), y = salary, fill = factor(bi_attrition))) +
  geom_boxplot() +
  labs(
    title = "Salary Distribution by Job Satisfaction and Attrition Status",
    x = "Job Satisfaction",
    y = "Salary",
    fill = "Attrition (0 = Stayed, 1 = Left)"
  ) +
  scale_fill_manual(values = c("blue", "red")) + 
  theme_minimal() 

Discussion:

The comparison of average performance ratings for employees who left versus those who stayed should be read from avg_ratings and the two bar plots. If those who left have lower average ManagerRating and SelfRating scores, that would point to dissatisfaction with management and with self-perceived performance. This pattern needs to be weighed against the logistic regression above, where manager_rating was not a significant predictor of attrition (p = 0.894) once salary and tenure were taken into account. The boxplot of salary by job satisfaction shows whether employees who left earned less than those who stayed at each satisfaction level, which would be consistent with the salary gap found in the t-test.

To address these issues, HR should enhance manager training to improve employee engagement, conduct regular self-assessments, and ensure competitive compensation through salary audits. Improving job satisfaction via flexible work arrangements, professional development, and recognition programs can also enhance retention. Additionally, regular employee surveys and thorough exit interviews can provide valuable insights into why employees leave and highlight areas needing improvement. Implementing these strategies can foster a more engaged workforce and reduce attrition.

0.5.6 Work-life balance and retention strategies

Task 5.6. Analyzing work-life balance and retention strategies

At this point, you are already well aware of the dataset and the possible factors that contribute to employee attrition. Using your R skills, accomplish the following tasks:

  • Analyze the distribution of WorkLifeBalance ratings for employees who left versus those who stayed.

work_life_balance_summary <- hr_perf_dta %>% group_by(bi_attrition, work_life_balance) %>% summarise(count = n(), .groups = “drop”)

print(work_life_balance_summary)

  • Use visualizations to show the differences.

1 Create the bar plot for WorkLifeBalance

ggplot(hr_perf_dta, aes(x = factor(work_life_balance), fill = factor(bi_attrition))) + geom_bar(position = “dodge”) + labs( title = “Distribution of Work-Life Balance for Employees Who Stayed vs Left”, x = “Work-Life Balance Rating”, y = “Count”, fill = “Attrition (0 = Stayed, 1 = Left)” ) + theme_minimal() + scale_fill_manual(values = c(“blue”, “orange”))

  • Assess whether employees with poor work-life balance are more likely to leave.

# Compute attrition rate by WorkLifeBalance
attrition_rate_wlb <- hr_perf_dta %>%
  group_by(work_life_balance) %>%
  summarise(
    total_employees = n(),
    total_attrition = sum(bi_attrition == 1),
    attrition_rate = (total_attrition / total_employees) * 100
  )

# Visualize the attrition rate by WorkLifeBalance
ggplot(attrition_rate_wlb, aes(x = factor(work_life_balance), y = attrition_rate)) +
  geom_col(fill = "brown") +
  labs(
    title = "Attrition Rate by Work-Life Balance Rating",
    x = "Work-Life Balance Rating",
    y = "Attrition Rate (%)"
  ) +
  theme_minimal()
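The attrition rates above describe the pattern, but they do not say whether the differences between ratings could be due to chance. One possible check is a chi-squared test of independence between work_life_balance and bi_attrition. The sketch below is illustrative only: it runs on a synthetic data frame (toy_dta, a hypothetical stand-in for hr_perf_dta), assumes a 1 to 5 rating scale, and uses made-up leave probabilities, so its output says nothing about the real data.

```r
# Minimal sketch: chi-squared test of independence on SYNTHETIC data.
# toy_dta is a hypothetical stand-in for hr_perf_dta; the 1-5 rating scale
# and the leave probabilities below are assumptions for illustration only.
set.seed(147)
n <- 500
toy_dta <- data.frame(
  work_life_balance = sample(1:5, n, replace = TRUE)
)

# Assumed pattern: lower ratings leave more often (not taken from the real data).
leave_prob <- c(0.35, 0.28, 0.20, 0.14, 0.10)[toy_dta$work_life_balance]
toy_dta$bi_attrition <- rbinom(n, size = 1, prob = leave_prob)

# Cross-tabulate rating by attrition status (0 = stayed, 1 = left).
wlb_table <- table(
  work_life_balance = toy_dta$work_life_balance,
  bi_attrition = toy_dta$bi_attrition
)
print(wlb_table)

# Row-wise proportions: share who stayed and left within each rating.
print(round(prop.table(wlb_table, margin = 1), 3))

# Test whether attrition status is independent of the rating.
wlb_test <- chisq.test(wlb_table)
print(wlb_test)
```

To apply this to the project data, replace toy_dta with hr_perf_dta when building the table. A small p-value would suggest that attrition and work-life balance rating are associated, although it would not by itself show that poor balance causes employees to leave.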

You are free to decide how to accomplish this task. Be creative and provide insights that will help HR develop effective retention strategies.

0.5.7 Recommendations for HR interventions

Task 5.7. Recommendations for HR interventions

Based on the analysis conducted, provide recommendations for HR interventions that could help reduce employee attrition and improve overall employee satisfaction and performance. You may use the following questions as a guide for your recommendations and discussion.

  • What are the key factors contributing to employee attrition in the company?

The analysis identifies low job satisfaction, poor manager ratings, and insufficient work-life balance as the primary factors driving employee attrition. Additionally, inadequate compensation relative to market standards further contributes to employees’ decisions to leave the company.

  • Which factors are most strongly correlated with attrition?

The correlation analysis reveals that job satisfaction and salary are the most strongly correlated with attrition. Employees who report low job satisfaction or feel under-compensated are more likely to leave. Manager ratings also show a significant relationship, indicating that employees who receive poor evaluations from their managers may feel unsupported and seek alternative employment.
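One way to produce such a ranking is to compute the Pearson correlation between each numeric factor and the 0/1 attrition indicator (a point-biserial correlation) and sort by absolute value. The sketch below is a hedged illustration on synthetic data: toy_dta is a hypothetical stand-in for hr_perf_dta, and the column names job_satisfaction, manager_rating, and salary are assumed to follow the same snake_case pattern as work_life_balance. Because the attrition values here are generated at random, the resulting correlations are not the report's findings.

```r
# Minimal sketch: rank factors by correlation with attrition on SYNTHETIC data.
# toy_dta and the column names job_satisfaction, manager_rating, and salary
# are assumptions for illustration; only work_life_balance and bi_attrition
# appear in the analysis code of this report.
set.seed(147)
n <- 500
toy_dta <- data.frame(
  job_satisfaction = sample(1:5, n, replace = TRUE),
  manager_rating = sample(1:5, n, replace = TRUE),
  work_life_balance = sample(1:5, n, replace = TRUE),
  salary = round(runif(n, min = 30000, max = 150000))
)
toy_dta$bi_attrition <- rbinom(n, size = 1, prob = 0.16)

# Pearson correlation of each factor with the 0/1 attrition indicator.
factor_cols <- setdiff(names(toy_dta), "bi_attrition")
attrition_cor <- sapply(
  factor_cols,
  function(col) cor(toy_dta[[col]], toy_dta$bi_attrition, method = "pearson")
)

# Sort so the strongest relationships (positive or negative) come first.
attrition_cor <- attrition_cor[order(abs(attrition_cor), decreasing = TRUE)]
print(round(attrition_cor, 3))
```

A negative value would mean that higher scores on that factor go with lower attrition. Correlation coefficients describe association only, so the ranking is best read as a guide to where HR should look first.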

  • What strategies could be implemented to improve employee retention and satisfaction?

To improve employee retention and satisfaction, HR should implement several key strategies, including developing training programs for managers to enhance their leadership and interpersonal skills, thereby fostering better communication and support for their teams. Conducting regular market assessments to ensure competitive salaries will help address dissatisfaction related to compensation. Additionally, introducing flexible work policies, such as remote work options and flexible hours, can promote a healthier work-life balance. Establishing employee feedback systems, like surveys and focus groups, will enable HR to proactively understand employee needs and areas for improvement. Lastly, creating pathways for career advancement through training and mentorship programs can engage employees and demonstrate the company’s investment in their future.

  • How can HR leverage the insights from the analysis to develop effective retention strategies?

HR can leverage insights from the analysis by prioritizing initiatives that specifically target the most significant factors influencing attrition, such as job satisfaction and management effectiveness. By utilizing data-driven approaches, HR can tailor interventions to meet employee needs, ensuring that solutions are relevant and effective.
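As one example of a data-driven way to prioritize, a logistic regression can estimate how each factor relates to the odds of leaving while holding the others constant. This is a sketch under stated assumptions, not the method used in this report: toy_dta is synthetic, the column names job_satisfaction and manager_rating are hypothetical, and the relationship between the factors and attrition is invented for the simulation.

```r
# Minimal sketch: logistic regression of attrition on SYNTHETIC data.
# toy_dta, the column names, and the simulated effect sizes are all
# assumptions for illustration; they are not results from hr_perf_dta.
set.seed(147)
n <- 500
toy_dta <- data.frame(
  job_satisfaction = sample(1:5, n, replace = TRUE),
  manager_rating = sample(1:5, n, replace = TRUE),
  work_life_balance = sample(1:5, n, replace = TRUE)
)

# Invented relationship: higher ratings lower the log-odds of leaving.
log_odds <- 1.5 -
  0.4 * toy_dta$job_satisfaction -
  0.3 * toy_dta$manager_rating -
  0.3 * toy_dta$work_life_balance
toy_dta$bi_attrition <- rbinom(n, size = 1, prob = plogis(log_odds))

# Fit the model; family = binomial() gives logistic regression.
attrition_fit <- glm(
  bi_attrition ~ job_satisfaction + manager_rating + work_life_balance,
  data = toy_dta,
  family = binomial()
)

# Exponentiated coefficients are odds ratios per one-point rating increase.
odds_ratios <- exp(coef(attrition_fit))
print(round(odds_ratios, 3))
```

An odds ratio below 1 for a factor would indicate that a one-point improvement in that rating is associated with lower odds of leaving, which can help HR decide which intervention to fund first.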

  • What are the potential benefits of implementing these strategies for the company?

The potential benefits of these strategies include reduced turnover rates, lower recruitment and training costs, and increased employee engagement and productivity. A more satisfied workforce is likely to lead to a positive organizational culture, enhancing the company’s reputation and ability to attract and retain top talent. Ultimately, investing in employee satisfaction can foster loyalty, reduce attrition, and contribute to the long-term success of the organization.